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ANALYSIS OF ERROR PROPAGATION IN PARTICLE 
FILTERS WITH APPROXIMATION* 

By Boris N. Oreshkin and Mark J. Coates 

McGill University 

This paper examines the impact of approximation steps that be- 
come necessary when particle filters are implemented on resource- 
constrained platforms. We consider particle filters that perform in- 
termittent approximation, either by subsampling the particles or by 
generating a parametric approximation. For such algorithms, we de- 
rive time-uniform bounds on the weak-sense Lp error and present 
associated exponential inequalities. We motivate the theoretical anal- 
ysis by considering the leader node particle filter and present numeri- 
cal experiments exploring its performance and the relationship to the 
error bounds. 



1. Introduction. Particle filters have proven to be an effective ap- 
proach for addressing difficult tracking problems [9]. Since they are more 
computationally demanding and require more memory than most other fil- 
tering algorithms, they are really only a valid choice for challenging prob- 
lems, for which other well-established techniques perform poorly. Such prob- 
lems involve (approximated) dynamics and/or observation models that are 
substantially non-linear and non-Gaussian. 

A particle filter maintains a set of "particles" that are candidate state 
values of the system (for example, the position and velocity of an object). 
The filter evaluates how well individual particles correspond to the dynamic 
model and set of observations, and updates weights accordingly. The set 
of weighted particles provides a pointwise approximation to the filtering 
distribution, which represents the posterior probability of the state. This 
approximation allows one to form estimates of the state values and hence 
track the state. 

The analysis of approximation error propagation and stability of non- 
linear Markov filters has been an active research area for several decades [13, 
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Table 1 
Greek and Operator Notation 



Symbol Definition 



II ■ II supremum norm of function h \ \h\ \ = sup^g^ 1^(2^)1 

I I ' I I osc 1 1 ^ 1 1 osc — 

\\h\\ +osc{h) (see Table 2 for definition of osc(-)) 

II ■ lltv ||P(-) - Q(-)lltv = sup{\P{A) - Q{A)\ -.AeT} 
P{Pi^t) Dobrushin contraction coefficient 

/3(P,,t) = sup{||P,,t(x„ •) - P^,t{yi, ■)\\t.;xi,y, G E^} 
X Positive integer indicating ratio of A'' to Nb 

5t binary variable indicating whether approximation occurs at time t 

eu{M), eu{G),eu,m Constants for regularity conditions 

{sk) Rademacher sequence of independent binary random variables 

')t{ht) unnormalized prediction Feynman-Kac model 

nih) For measure fj, in 'P(-E) and function h, 

^^^^^ u{dX{xi)) Density associated with distribution u G ViEi) and cr-finite measure 
A on measurable space {Ei,£i) 
rit(ht) normalized prediction Feynman-Kac model 

^iZ-p C^) Orlicz norm of a random variable Y 

for non-decreasing convex function tpp{x) — e^ — 1 
<l>t Feynman-Kac update-operator r^t ~ ^t{rit-i) 

$i_t semigroups associated with the normalized 

Feynman-Kac distribution flows, $i,t = $t o o . . . o $^-1-1 
vpt Boltzmann-Gibbs transformation 'i't{v){dxt) = ^^^^^ Gt{xt)i^{dxt) 

a^{h) Measure of oscillations for sequence of functions hk 

N 



G parameter space 

9 parameters of mixture approximation 

fc-th particle at time t 



21]. In the case of the particle filter, there has been interest in establishing 
what conditions must hold for the filter to remain stable (the error remaining 
bounded over time), despite the error that is introduced at every time-step 
of the algorithm by the pointwise approximation of the particle represen- 
tation [3~6, 8, 16, 17]. The impact of modeling errors, including incorrect 
initial distributions and short-lived model mismatches, has also been inves- 
tigated [17,27]. 

In this paper, we focus on examining the impact of additional intermittent 
approximation steps which become necessary when particle filters are imple- 
mented on resource-constrained platforms. The approximations we consider 
include subsampling of the particle representation and the generation of 
parametric mixture models. The main results of the paper are time-uniform 
bounds on the weak-sense Lp-error induced by the combination of particle 
sampling error and the additional intermittent approximation error (sub- 
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sampling or parametric). We employ the Feynman-Kac semigroup analysis 
methodology described in [4]; our investigation of parametric approximation 
is founded on error bounds for the greedy likelihood maximization algorithm, 
which was developed in [19] and analyzed in [24,29]. 

Throughout the paper, we will motivate the theoretical analysis by con- 
sidering the concrete example of the "leader node" particle filter [20], an 
algorithm (described in detail below) that has been proposed for distributed 
tracking in sensor networks. We outline two other examples to illustrate that 
the analyzed problem arises in several practical settings. 

1.1. Example 1: leader node particle filter for sensor network tracking. 
One of the major concerns in distributed sensor network tracking is balanc- 
ing the tradeoff between tracking performance and network lifetime. Sensor 
nodes are most commonly battery-powered devices, so it is important to 
limit the energy consumption, which is dominated (if the sensors are passive) 
by communication. Particle filter tracking algorithms in sensor networks fre- 
quently adopt a centralized approach, wherein the particle filter resides at a 
computation centre and measurements from remote sensors are transported 
across the network to this centre. This approach has several disadvantages. 
Centralization introduces a single point of failure and can lead to high, un- 
evenly distributed energy consumption because of the high communication 
cost involved in transmitting the data. 

Distributed algorithms, such as the distributed particle filtering algo- 
rithms proposed in [1,25], address the aforementioned problems. These al- 
gorithms decentralize the computation or communication so that a single 
fusion centre is not required. Multiple particle filters run concurrently at 
different sensor nodes and compressed data or approximate filtering distri- 
butions are shared between them. These distributed algorithms, while miti- 
gating some of the inherent problems of centralization, can be computation- 
ally expensive, because multiple nodes are required to perform computation 
throughout the entire tracking procedure. 

The leader node particle filter., proposed in [20, 30] and refined and ana- 
lyzed in [11,28], represents a compromise; it is partially distributed, in the 
sense that at any time-step only one node performs the particle filtering (the 
leader node), but this node changes over time. The setting corresponding to 
this filtering paradigm is depicted in Fig. 1. A leader node (depicted by the 
large circles) is responsible for performing local tracking based on the data 
acquired by the satellite sensor nodes (depicted by small circles). The satel- 
lite nodes have sensing capabilities and can locally transmit the acquired 
data to the nearest leader node. The leader node fuses the data gathered by 
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• • • Satellite sensor 



Fig 1 . The leader node distributed particle filtering setting 
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the satellite nodes in its neigbourhood, incorporating them into its particle 
filter. 

Significant communication energy savings can be achieved if the leader 
node follows the target, because sensor nodes only have to relay their mea- 
surements to a nearby location. This is illustrated in Figure 1; the leader 
node approximately tracks the target trajectory (depicted by the squares). 
Sensor management strategies are used to determine when to change leader 
node [28]. When this occurs, information must be exchanged so that the new 
leader node can reconstruct the particle filter. In attempting to alleviate the 
communication cost of transmitting all particle values when the leader node 
is exchanged (which can involve thousands of bits), the filtering distribution 
is more coarsely approximated, either by transmitting only a subset of the 
particles or by training a parametric model. 

1.2. Example 2: Tracking with delayed measurements. In wireless sensor 
networks, packet losses can lead to measurements arriving out-of-order to a 
node performing tracking. Incorporating delayed measurements into a par- 
ticle filter is important, because they can be highly informative and improve 
tracking performance. One of the simplest, and most effective, strategies is 
to run the particle filter again from the time-step corresponding to the de- 
layed measurement. This strategy can be hampered by the limited memory 
of most sensor network devices, which means it is impossible to store full 
particle representations for multiple time-steps. The alternative is to store 
an approximation, either a subsampled set of particles, or a parametric rep- 
resentation, for previous time-steps. When the particle filter is run again, 
it is initialized by sampling from the approximated distribution. The ef- 
fect is equivalent to injecting intermittent approximations (subsampling or 
parametric) into the particle filter. 

1.3. Example 3: Real-time tracking with computational constraints. When 
real-time tracking is performed on an embedded processor with computa- 
tional limitations, it can be important to adjust the time devoted to particle 
filter computation. For example, consider a mobile robot that employs a 
particle filter to track its position and at the same time conducts iterative 
strategic planning of its motion in order to reach a target location. The goal 
can be achieved more efficiently (in less time and with less energy expendi- 
ture) if there is an adjustment of the computational time devoted to each of 
these two tasks. The adaptive particle filter proposed in [10] and the real- 
time particle filter of [14] adjust the number of particles at each time-step 
based on an estimate of the complexity of the filtering distribution (assign- 
ing fewer particles for simple distributions). Through these schemes, the 
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accuracy of the position estimation can be preserved, but more time can be 
devoted to motion planning. The adaptation of the number of particles is an 
example of the subsampling approximation that we analyze in this paper. 

1.4. Paper Organization. The rest of the paper is organized as follows. 
Tables 1 and 2 summarize the notation that is used in the paper. Section 2 
sets up the analysis framework and outlines relevant terminology. Section 3 
presents some foundational results that serve as the basis for our analysis. In 
Section 4 we present error bounds and exponential inequalities for particle 
filters that perform intermittent subsampling, and in Section 5 we analyze 
the performance of particle filters that employ parametric approximation. 
Section 6 describes numerical experiments that illustrate the performance 
of the algorithms we analyze and the relationship to the bounds. Section 7 
discusses related work and Section 8 summarizes the contribution and makes 
concluding remarks. 

2. Analysis Framework and Terminology. We consider a discrete- 
time non-linear filtering task in which the target dynamics and observations 
can be described by the following general state-space signal model: 



Here Xt G M'^"' is the target state vector at time t, Yt G is the mea- 
surement vector, Qt and C* ^-re system excitation and measurement noises 
respectively, ft is a nonlinear system map ft : M"^- M'^^ and gt IS a non- 
linear measurement map gt : M*^"^ — > W^y . 

In order to conduct stability (error propagation) analysis, we need to 
introduce slightly more rigorous mathematical notation. Let {Et,£t), t £N 
be a sequence of measurable spaces. The target state vector evolves according 
to a non-homogeneous discrete-time Markov chain Xt with transitions Mt+i 
from Et into Et+i. We denote by X^. = X^Q-tj the historical path process 
associated with Xt- Associated with a measurable space of the form {E,£) 
is a set of probability measures V{E) and the Banach space of bounded 
functions Bb{E) with supremum norm: 



We define a convex set Osci(-E) of iS-measurable test functions with finite 
oscillations: 



(1) 
(2) 





||/.|| = sup|Ma:)|. 



osc(/i) = sup{\h{x) — h{y)\; x,y £ E) 
Osci(£;) = {h : osc(/i) < 1} 



imsart-aap ver. 2009/05/21 file: main.tex date: August 20, 2009 



ERROR ANALYSIS OF PARTICLE FILTERS WITH APPROXIMATION 



7 



For any h £ Bh{E) it is also possible to define the following: 

||/i||osc = ||/i||+OSc(/l). 

In order to simplify the representation, we define for a measure /i G V{E), 



and for the Markov kernel from (£^j_i, to {Ei,£i): 



Thus the composite integral operator from {Ei,£i) to {Et,£t), Mi^t = ^h+i ■ ■ ■ Mf, 
has the form: 

{Mi+i . . .Mt){xi,dxt) = / Mi+i{xi,dxi+i) . . . Mt{xt-i,dxt). 

2.1. Feynman-Kac models. Throughout the rest of this paper we adopt 
the methodology developed in [4] to analyze the behaviour of filtering dis- 
tributions arising from (1) and (2). This methodology involves representing 
the particle filter as an A^-particle approximation of a Feynman-Kac model. 
We now discuss the Feynman-Kac representation; for a much more detailed 
description and discussion, please refer to [4]. 

The evolution of the unconditional signal distribution in (1) is completely 
defined by the Markov transition kernel M(-, •) and the initial signal distri- 
bution fiQ-. 

(3) Pr{Xt G dxt\Xt-i = xt-i} = Mtixt.udxt) 

According to (3), the signal distribution at time t, with respect to the se- 
quence of random variables Xi, . . . ,Xt, can be written as follows 

P^,t(d(xo, . . . , Xf )) = fj.{dxo)Mi{xo, dxi) . . . Mt{xt-i,dxt) 

defining the filtered probability space 



(n = J[E^,Tt,J^oo,^^'^ 



where the family of a-algebras has the following property: J^i C J^j C J^oo for 
any i < j and = cr{'^i>o^i) We now introduce bounded and non-negative 
potential functions Gt : Et ^ [0, oo) that characterize the properties of 
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the observation process in (2). This leads to the following definition of the 
unnormalized prediction Feynman-Kac model, for ht G Bb{Et) and t G N. 



7t(/it) =E^o \^ht{Xt)]lGiiXi)j 

= / ht{Xt)\{Gi{X,)¥r,,^t{d{xQ,....xt)) 



"[0:t] 1 = 

where Ej^^ denotes expectation with respect to the distribution of an Ef- 
valued Markov chain Xt with transitions Mt- The normalized prediction 
Feynman-Kac model is then: 

ritih) = — — 

The Boltzmann-Gibbs transformation reflects the effect of the like- 
lihood function at time t on the normalized prediction model. The trans- 
formation maps the set of probability measures on Et onto itself, i.e. 
"^t ■ ^ ^ 'Pt{Et) ^ ^t(i^) G Vt{Et). For a particular measure 

^t{v){dxi) = -^—Gt{xt)v{dxt). 

This transformation is used to construct the key operator $j : V{Et^i) — > 
■p(-Ej), which is used to update the predictive posterior distribution from 
time step t — 1 to time step t: 



Thus this operator combines the fitness assessment described by the likeli- 
hood function Gt-i and the diffusion step described by the Markov kernel 
Mt 

The repeated application of this operator, ^t{'nt~i)t>i, results in the semi- 
groups ^i^t, i ^ t associated with the normalized Feynman-Kac distribution 
flows rjt- 

= $t O O . . . O 

The semigroup ^i^t describes the evolution of the normalized prediction 
Feynman-Kac model from time i to time t: 



imsart-aap ver. 2009/05/21 file: main.tex date: August 20, 2009 



ERROR ANALYSIS OF PARTICLE FILTERS WITH APPROXIMATION 



9 



It is related to Gi^t '■ Ei — > (0, oo), the composite potential functions on Ei, 
and Pi^t '■ — > V{Et), the Markov kernels from Ei to Ef. In particular, 
Gi^t is defined as the expectation of the composite potential constructed 
based on the observations acquired during time-steps i, . . . ,t — l with respect 
to the shifted chain Mj+i . . . Mf. 

. t-i 

Gi^t{xi) = / W_Gj{xj)Mi+i{xi,dxi^i) . . .Mt{xt-i,dxt), 

and Pi^t is defined by the Feynman-Kac formulae as follows: 

lE,+it^t{xt) n Gj{xj)Mi+i{xi,dxi+i) . . . Mt{xt-i,dxt) 
n,t{ht) = ^ . 

/e.+it n Gj{xj)Mij^i{xi,dxiJ^i) . . . Mt{xt-iAxt) 
j=i 

The Boltzmann-Gibbs transformation 

Gi^t{xi)r]i{dxi) 



^i,t(??j)(dxi 



'ni{Gi,t) 
and the semigroup ^i^tiVi) 

can then be represented via these two quantities. 

2.2. Feynman-Kac formulae and the Bayesian framework. It is instruc- 
tive to draw parallels between the Feynman-Kac description of the filtering 
process discussed above and the Bayesian formulation of sequential filtering. 
The integral operator, Mf(xt-i, dxf), describing the evolution of signal diffu- 
sion in (1) is most naturally related to the state transition density (assuming 
one exists): 

Mt{xt-i,dxt) = pt{xt\xt-i)dxt. 

The measurement equation (2), which is compactly described by the poten- 
tial function Gt{xt) in the Feynman-Kac framework, is directly related to 
the likelihood function Pt{yt\xt) in the Bayesian framework: 

Gt{xt) = pt{yt\xt). 

We can then see how the diffusion step within the Feynman-Kac framework 
is related to the prediction step within the Bayesian framework: 

^>t(r?t-i) = / '$t-i{m-i)idxt-i)Mt{xt_i,dxt) Feynman-Kac 
Pt{xt\yi:t-i) = / p{xt^i\yi;t^i)pt{xt\xt^i)dxt-.i Bayes 

JEt-i 



imsart-aap ver. 2009/05/21 file: main.tex date: August 20, 2009 



10 



B. ORESHKIN ET AL. 



Thus the operator $j generates the normahzed predictive posterior distribu- 
tion, r]t{dxt) = <I>t(?7j_i)(d2;t) = p{xt\yi:t-i)dxt, and the Boltzmann-Gibbs 
transformation 'i!tii]t){dxt) = p{xt\yi:t)dxt generates the normahzed poste- 
rior distribution using the update step analogous to that of the Bayes modeh 

,T, / N Gt-i{xt^i)rit^i{dxt^i) 

?7t_i) = — Feynman-Kac 

JEt-i Gt{xt^i)rit^i{dxt^i) 

Vt-\{xt-\\y\:t~\) = -, . 7 -, , Bayes 

\Et^^Vt-\\yt-\\xt-\m-\KXt-\\y\,t-2)dxt-\ 

The normahzation constant r]t-\{Gt~\) = p(yt-i|yi:t-2) can be interpreted 
as equivalent to Bayes' evidence at time t — 1, p(yt-i|yi:i-2)- We conclude 
that the Feynman-Kac formulae are directly related to the predict-update 
Bayesian recursion. The difference between the two formulations lies in the 
fact that the Feynman-Kac formulae describe the evolution of distributions, 
while the Bayesian framework describes the evolution of the corresponding 
densities (assuming that these densities exist). 

2.3. N -particle approximations. Following [4], we can define a parti- 
cle filter by developing an A^-particle approximation to the Feynman-Kac 
model. This approximation consists of N path particles: 

= (eM)o<.<t e = %i] kGi,...,N 



The particle approximation of the prediction Feynman-Kac model is defined 
as: 

1 ^ 



k=l 

where 5 is the Dirac delta function. 

The A^-tuple represents the configuration at time t oi N particles 
and resides in the product space . The particle filter then involves a 
two-step updating process: 

A r^N selection ^ mutation ^ 

The selection stage consists of selecting randomly N particles . This ran- 
dom selection is achieved by setting, with probability etGt{£,t)^ ^ = oth- 



erwise we choose a random particle with distribution y^iLi ^^^^^^^ ■ 5tk, 

and we set ^ = . During the mutation phase, each particle ^ evolves ac- 
cording to the Markov transition Mt+i. 
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2.4. Regularity Conditions. The analysis we present relies on certain as- 
sumptions about the regularity and mixing properties of the Markov kernels 
and likelihood potential functions. We adopt the same assumptions used for 
deriving time-uniform bounds in [4]. 

We define the following condition on the Markov functions: 

• (Mju™^ There exists an integer m > 1 and strictly positive number 
eu(M) G (0, 1) such that for any z > and Xi,yi G Ei we have 

Mi^i+^{x,, •) = Mi+iMi+2 ■ ■ ■ Mi+m{xi, •) > e^{M)Mi^i+miyi, •)• 

The following regularity condition is defined for the likelihood potentials: 

• (G')u There exists a strictly positive number e^iG) G (0, 1] such that 
for any t and xt,yt G Ef 



The Dobrushin contraction coefficient ((3{Pi^t) G [0, 1]) plays a key role in 
our analysis. This is defined as follows: 



If (G')u and (M)u hold, then according to Proposition 4.3.3. [4] we have 
the following estimate for the Dobrushin contraction coefficient: 



3. Foundational Results. Our analysis of error propagation relies on 
results that characterize the impact of sampling operations and the man- 
ner in which distribution discrepancies are propagated in the Feynman-Kac 
framework. In this section, we state and prove some of these foundational 
results, which build upon relationships presented in [4]. 

We first introduce some necessary notation. Denote by {fJ-k)k>i a sequence 
of probability measures on [E, £), and by {hk)k>i a sequence of (^-measurable 
functions. Denote by m{X) = J2k=i ^x*" the iV-empirical measure asso- 
ciated with a collection of independent random variables {X^)k>i with re- 



spective distributions (/ifc)fc>i- Further, define m{X){h) — J2 ^fc(-^fc) and 



Gtixt) > e^{G)Gt{yt) 



P{Pi,t) = sup{\\Pi^t{xi,-) - Pi,t{yi,-)\\tv]Xi,yi G Ei} 




N 



k=l 



fc=l 
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Let the sampling operator 5^ : V^E) — > V{E^) be defined as: 

1 ^ 

(4) s^(^)W = ^E^(^') • 



k=l 



where (^^, . . . ,^^) is the i.i.d sample from rj. With this notation, the stan- 
dard particle filter can be expressed using the recursion t]^ = {^tiVt^i))- 

3.1. Bounds on errors induced by sampling. The following result bounds 
the weak-sense Lp error induced by the sampling operator for functions 
with finite oscillations. We recall that for a measure P G V{E), P{h) = 
jj,h{x)P{dx). 

Lemma 1. Suppose P £ V{E), then for any p > 1 and an £ -measurable 
function h with finite oscillations we have 

E{|[p-s^(p)](/.)n'<c(p)i^, 

where c{p) is defined as follows: 



c{p) 



1 if p=l 

2-p/2pr[p/2] if p>l 



and T[-] is the Gamma function. 

Proof. Since E{[P - S^{P)]{h)} = P{h) - P{h) = 0, we have from the 
Chernov-Hoeffding inequality: 



F{|[P-5^^(P)](/i)| > e} < 2e ^ 

We note that 

P{|[P - S^{P)]{h)\P >e] = P{|[P - S^{P)]{h)\ > e^/P} 
and we have from the Chernov-Hoeffding inequality: 

Next we recall the following property: 

POD 

E{|[P-5^(P)](/i)|}= / ¥{\[P - S"" {P)]ih)\ > e}de 

Jo 
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And finally we obtain: 

E{|[p-5^(p)](/.)ni 



< 



F{\[P-S^{P)]{h)\ > e^/P}de 



aP{h)p{2N)-2T 



Applying Lemma 7.3.3 of [4] allows us to set c(l) = 1 instead of c(l) = 
2"i/2r[l/2] = and this completes the proof. □ 

An almost identical result applies for the A^-empirical measure m{X). 
Lemma 2 tightens Lemma 7.3.3 from [4] and extends it to include non- 
integer p (for a comparison of the two bounds see [22]). 

Lemma 2. For a sequence of £ -measurable functions {hk)k>i with finite 
oscillations and satisfying Hki^k) = for all k > 1, 



E{\m{X){h)\P}T^ < c{p)v 



a{h) 



N 



for any p > 1, where c{p) is defined as in Lemma 1. 

The proof follows the same argument as that of Lemma 1, noting that 
the condition E(m(X)(/i)) = 0, which is necessary for application of the 
Chernoff-Hoeffding bound, follows from the condition /ifc(/ifc) = 0. 

The following theorem provides a bound on the moment-generating func- 
tion of the empirical measure m{X). The result employs Lemma 1 to tighten 
Theorem 7.3.1 of [4]. 

Theorem 1. For any sequence of £ -measurable functions {hk)k>i such 
that ^k{hk) = for all k > 1 and cr{h) < oo, we have for any e 



E 



{e^v^l-WWI}<l + ea(/i) (^1 



+ 



2 



1 + Erf 



ea{h) 



V8 



Proof. We first utilize the power series representation of the exponential: 



j£|geMX)(h)|| ^ ^ i^E{|m(X)(/i)r} 

n>0 ^' 

-^E{\m{Xm\'}+eE{\m{X)m + ^ ^E {|m(X)(/.)r} 

n>2 ^' 
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Utilizing Lemma 1 we have: 



= 1 + 



N 
ea{h) 



pi'' 

+ ^ ^a"(/i)n(2iV)-"/2r [n/2] 



n>2 



+ E 

n>2 



ea{h) 



(2iV)i/2 



r[n/2] 
(n- 1)! 



N 



'2N 



'2N 



1 +Erf 



ea{h) 



Choosing e = ey N and rearranging terms completes the proof. 



□ 



The following corollary containing a more tractable variation of the pre- 
vious theorem can be useful for deriving the exponential inequalities for the 
particle approximations of Feynman-Kac models. 

Corollary 1. For any sequence of £ -measurable functions (/ifc)fc>i 
such that ^k{hk) = for all k > 1 we have for any e 



a{h) < oo 



Proof. The proof is straightforward since sup^. Erf(2;) = 1, \ — ^JtxI2 < 
and e^'^'W > 1. □ 

We note that the simplified estimate of the moment-generating function 
in Corollary 1 is much tighter than the bound in Theorem 7.3.1 of [4] for 
asymptotically large deviations e while the more complex bound in Theo- 
rem 1 is uniformly tighter over the range of e. 

3.2. Error propagation. Proposition 4.3.7. in [4] underpins our analysis 
of the stability of the semigroups ^i^f 

Proposition 1 (Del Moral [4], Proposition 4.3.7). For any < i < t, 

G V{Ei), and hf G Bb{Et) with osc(/it) < 1, respectively \\ht\\ < 1, there 
exists a function hi in Bh{Ei) with osc{hi) < 1, respectively \\hi\\ < 1, such 
that for any r]i E V{Ei) we have 



(5) 



i,t \ lose 
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and respectively 



Using the following logic for some positive function ip we can obtain a 
result intermediate between (5) and (6): 

llv'llosc = llv'll +osc(v3) = + sup \ip{x) - ip{y)\ 

x,y£E 



< 



< 



1 + 



1 + 



1 



sup 



1 



sup 

x,y£E 



ip{x) 



|^(x)| 



(fix) 



< M 



inf. 



y£E 



SUp^g^; ip{x) J 

Summarizing, we can formulate the following proposition. 

Proposition 2. For any < i < t, fii e 'P{Ei), and ht G Bb{Et) with 
osc{ht) < 1 there exists a function hi in Bb{Ei) with osc(/ij) < 1 such that 
for any rji G V{Ei) we have 



IG. 



i,t\ 



inf 



y,(^E, 



IG 



\{r]i - lii){hi)\ 



These results describe the propagation of one-step approximation error 
through the non-linear operator ^i^t- They reveal the link between the initial 
error at time i and the propagated error at time t through the properties of 
the potential functions Gi^t and the Dobrushin contraction coefficient (3{Pi^t)- 

According to Proposition 4.3.3. [4], the oscillations of the potential func- 
tions can be bounded as follows under assumptions (G)u and (M)u™^: 



IG 



\Gi+\ 



'ni{Gi,t 



> 6,(M)6™(G) 

<6-i(Af)6--(G) 



Thus Proposition 2 implies that under assumptions (G)u and (M)u™'^ the 
error propagation in the sequential Feynman-Kac filter can be characterized 
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as follows: 



l(t-i)/m\ 



(7) 




\{r]i - fii){hi)\ 



The global approximation error between the true filtering distribution 
and its A^-particle approximation, rj^ — rjt, can be related to the sequence 
of local approximation errors rjf^ — ^i{ri^i),i = 0, . . . , t [4]: 



4. Particle Filters with Intermittent Subsampling. This section 
presents an analysis of the error propagation in particle filters that perform 
intermittent subsampling approximation steps. We focus on the case where 
the number of particles N is constant and the subsampling approximation 
step always uses Ni, particles. It is relatively straightforward to generalize our 
results to other settings, where the number of particles N and the size of the 
subsample approximation may vary over time. Our main results are a time- 
uniform bound on the weak-sense Lj,-error and an associated exponential 
inequality. 

4.1. Algorithm Description. Denote by 5t a binary variable which indi- 
cates whether a subsampling approximation is performed at time-step t. In 
our analysis, we will assume that this variable is the outcome of a deci- 
sion function based on the set of particles {^tLij^i and observations yt 
at the previous time-step. Similar results could be obtained should the de- 
cision function be of a more general nature (for example, based on either 
the entire history of the particle filter, and/or the entire history 

of measurements yi-.t)- We define So{-,-) = 0, and we assume there exist 
probabilities: 



The expectation is with respect to the Monte-Carlo sampling, measurement 
noise and the possible target trajectories. The value of qt characterizes the 
frequency of subsampling approximation at time-step t. 

The subsample approximation particle filter can then be expressed as: 



t 



(8) 



E{6t} = n^t 



i} = qt 




rjf = S'' (Mr^l,)) if 5^ = 



(9) 



imsart-aap ver. 2009/05/21 file: main.tex date: August 20, 2009 



ERROR ANALYSIS OF PARTICLE FILTERS WITH APPROXIMATION 17 

Example: leader node particle filter with subsampling 

Suppose that £ = {1, 2, . . . , L} is the set of leader node labels and ev- 
ery leader node with label £ £ C has a set of satellite nodes Si that take 
measurements and transmit them to the leader node. The number of such 
satellite nodes in the vicinity of the leader node i is \Se\. The state-space 
model describing the target evolution and measurement process at every 
leader node is then a simple modification of the general state-space model 
described in Section 2: 

(10) Xt = ft{Xt-uQt) 

(11) Yi=g>{XlCD VjG5,, 

Thus the target model is the same at every leader node but the observation 
process may be different. 

For the leader node particle filter, the decision function 
St{{it-i\f=i^^t *)i is used to decide whether or not to change a leader 

node at time t. Here denotes the set of all measurements recorded at 
time t by the set of satellite sensor nodes S^^ of leader node It- The leader 
node algorithm with subsampling can be described as follows: 

(12) <^i^{r^l,) r?^ r?^ r?f if 6, = 1, 

(13) if 5^ = 

Here the implication sign ^ represents a sampling operation and the right 
arrow — > denotes the communication process. 

If = 1, the current leader node it determines the next leader node it+i 
(through a sensor management algorithm, see e.g. [28]), and calculates the 
A'^b-particle approximation to the current predictive posterior distribution, 
r]^ , by sub-sampling the output of the standard A^-particle propagation step. 
Finally, the leader node it transmits rj^^ to the next leader node it+i, which 
recovers the A^-particle approximation by up-sampling. If 5t = 0, the current 
leader node performs standard particle propagation. 

In the leader node setting, the uniform condition (G')u can be formulated 
as follows: 

• (G')^ There exists a strictly positive number eu(G') G (0, 1] such that 
for any i, t and xt,yt G Et 

Gi,{xt)>e^-{G)Gi,{yt) 
Indeed, (G)^ holds if we take e^{G) = inf mine£((G) and i^u = max 15^1 . 
Thus Proposition 2 implies that in the leader node setting and under assump- 
tions {G)i and iM)t^ the error propagation in the sequential Feynman-Kac 
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filter can be characterized as follows: 



(14) 



2-eu(M)e™^"(G) 



[{t-i)/mi 



e,(M)eir^"(G) 



KVi - IJ'i){hi)\ 



4.2. Time-uniform error bounds and exponential inequalities. We now 
analyze the global approximation error for particle filtering with intermittent 
subsampling. We first present a theorem that specifies a time-uniform bound 



on the weak-sense Lp error. 



Theorem 2. Suppose assumptions {G)u and (M)L™'' hold. Suppose fur- 
ther that ^{Si = 1} < Qu for any i >0 and < (/u < 2/3. Then for a positive 
integer x such that N = x^b, t > p > 1 and ht G Osci(-Et) we have the 
time uniform estimate 



supE{|[r?f -r?,](/iOr} 
t>o 



N 



q\!''^/x+{l-qu. 



i/p 



where the constant eum ^s^-' 



(15) 



m(2-6„(M)e-(G)) 



Proof. We begin by applying Minkowski's inequality to (8) 

1=0 

and then applying Proposition 2: 

^ E { I [^,^^,, {^^) - cD,^,,, (<&,^ (r?f ,))] {K)\'Y 



i=0 
i=0 



1 1 Gf , J. 1 1 



N 



^In the leader node setting this is given by the following expression eu,m ~ m{2 
£u(M)e;r'^"(G))/£S(M)el''"-^'^"(G). 
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Furthermore, applying (7) we have: 



^E{|[ci>,^,,,(r?f)-cl>,^,,,($,^(r?f,))] {h,)\'Y 



1=0 



< 



e,(M)e-(G) 
t 



x2^ll-e;(M)tl'"-i'(G) 

i=0 



[(t~i)/rn\ 



Next we analyze each individual expectation under the sum above. In partic- 
ular, using the structure of the algorithm defined in (24) and the definition 
of sampling operator introduced in (4) we can rewrite the terms comprising 
the sum in the following explicit way: 



(16) 



{| o S^'H^i.ir^ti)) + (1 - 5.)5^(^.,(r/ili)) - <^..(^f i)] (Z^Of 



E- 



Grouping the terms and using Minkowski's inequality again we conclude the 
following: 



E 



+ 



E {|(1 - 5,) [5^($4(^?.-i)) - ^e^ivti)] {K)\'Y . 



Adding and subtracting 6iS^^ {^£^{rj^^)) in the first term on the right and 
applying Minkowski's inequality again we have: 

(17) E{\[rJ^-^e.{vtl)]{h^)\'Y 

+ e{\6, [s''-{^eM-i))-^Uvti)] {h^)\'Y 

+ e{|(1-5,) [s''{<^e,{vti))-^Ur^f_,)] {h,)\'Y 

We see that each error term under the sum splits into three individual terms, 
describing the approximation paths the leader node algorithm can follow at 
time i. N = x^h then the A^-particle approximation after subsampling 
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can be recovered from the A^b-pai'ticle approximation without error by repH- 
cating the iVb-particle approximation x times. Thus the first term in (17) is 
zero. 

The analysis of the remaining two terms is similar. We first concentrate 
on the second term. Recall that 5i = <5i({?i_i}jLi) ^ Thus given the 

cr-algebra J-'i-i and the realization of the current measurement, ' = y- ' , 
the output of the decision rule is independent of the sampling error, 
[S^{^(>^{7](^_^)) - ^ei{vf-i)]{hi)- We exploit this Markovian nature of the 
decision rule and apply Lemma 1 to the conditional expectation rendering 
the following bound: 



(18) 



E 



PI i/p 



{|<5,[5^^($,,(r?f,))-$,^(r?f,)] {h,)f} 



St. 



■'}} 



< 



(p) i/p 



Combining the analysis results for all three terms we obtain: 



E 



We note that the expression in brackets has the form ip{qi) = q]^^{a + /3) + 
(1 — qi)^^Pa for some /3 > a > 0. For p > 1, <f{qi) has maximum at qi = (/max 
with 

_ 1 

9max — ~ 



1 + 



p/(l-p) ' 



We have that (p{qi) is non-decreasing on q^ G [0, ^max] and non-increasing on 
qi G (feax) !]• Noting that [{a + /?)/a]^^^^~^^ is increasing in p we obtain: 



> 



1 + 



a+/3 



> inf 

/3:/3>Q I _^ 



2/3. 



Thus if < 2/3 < (/max then for any i > we have the time- uniform 
estimate: 



E 



+(1 



^i/p. 
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Finally, noting [4] that: 



(19) E{^-^liM)ei-'HG)f'-'^^'^U 



m 



1=0 



eliM)e[r~^\G) 



we complete the proof of theorem. 



□ 



The result can be generalized to cases where is not an integer multiple 
of Nh, at the expense of a slight loosening of the bound. 

Corollary 2. Suppose the assumptions of Theorem 2 apply, except we 
allow any integer Ni, < N. Then for any t > 0, p > 1 and hf £ Osci(£'t) we 
have the time uniform estimate 



supE{|[ryf - ViKhtWY^' < e„,„,cVf(p) (gV^ 
where the constant eu,m is defined as in (15). 



1 1 

+ 



+ (1 - qu) 



i/p. 



1 



The corollary follows by allowing for sampling error to arise in the first 
term in (17): 



E 



and incorporating this error bound throughout the rest of the proof of The- 
orem 2. 



Corollary 3. Under the same assumptions as Theorem 2, we have for 
p G N and ht G Osci(-Et) the time uniform estimate 



(20) supE 



[m -m\{htW] < ^ [QuX'^' +(i-9«)j 
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Proof. Starting with (16), we perform a different error decomposition: 



(21) E{\[r,^ -'^>eM-l)]ih^)\y 

+ (1 - 5.) [5^($,,(ei)) - {K)\'y 



< 



p-i 



We observe tliat 6i{l — 6i) = and that if iV = x^h for integer we 
can reconstruct an A^-sample representation from the A^b sample with no 
additional error. Thus: 



Applying the same conditioning as in (18) and utilizing Lemma 1 

i/p 



E 



< 



Qicip) ^ (1 ~ Qi)c{p) 



p/2 



(22) 



^(.x^/^.(i-.)) 



i/p 



We note that x ^ 1 and qiX + {l — qi) < QuX + (1 ~ 9u) under the assumption 
Qi < Qu- The final step in the proof involves applying (19) as in the proof of 
Theorem 2. □ 



The intuitive implication of Theorem 2 and Corollary 3 is that rare ap- 
proximation events have limited effect on the error performance of the sub- 
sample approximation particle filter. The L2 error bound for the standard 
particle filter is the same as (20) of Corollary 3 taken with p = 2, except 
for the term {QuX ~^ ~ Qu))^^"^ ■ This expression thus quantifies the per- 
formance deterioration, in terms of L2 error bounds, due to the subsample 
approximation step. 
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Example: leader node particle filter: If the compression factor, x, is X = 10; 
and subsample approximations occur with probabiUty 0.1, then the deterio- 
ration of the root mean-square performance deterioration captured, in terms 

1 /2 

of bounds, by the factor (0.1 x 10 -|- (1 — 0.1)) ' , is around 40%. The com- 
munication overhead, on the other hand, represented by the total number 
of particles transmitted during leader node hand-off, is reduced by a factor 
of 10. The compressed particle cloud exchanges are most efficient in scenar- 
ios where the targets being tracked have slow dynamics and the density of 
leader nodes is relatively low (both implying rare hand-off events), but the 
tracking accuracy requirements and leader-to-leader communication costs 
are high. 

Theorem 3 below provides the exponential estimate for the probability of 
large deviations of the approximate Feynman-Kac flows associated with the 
subsample approximation particle filter. Before proceeding to Theorem 3 we 
state a technical lemma. 

Lemma 3. Let X and Y be real random variables taking values in A' C R 
and y CM. and let the joint distribution of these variables be Px,Y{d{x^y)). 
Then for any e G M we have: 

P{X + Y >e} < ¥{X > e/2} + f{Y > e/2} 

Proof. Let us define subsets Xx>y ^ Xx>y = {x ^ X : x > y ,y ^ y} 
and Xx<:y = X^^y, X^^y = {x G X : x < y,y G y}. Denote by Icond the 
indicator function, taking value 1 where the condition cond holds and 
elsewhere. Now write the explicit expression for ¥{X + Y>e}: 

F{X + Y>e}= f f l,.+^>,Px,y (d(x, y)) 
Jy Jx 

= 11 lx+y>ePxx{<^{x,y)) + I / lx+y>ePx,Y{<^{x,y)) 

Jy Jx,>y Jy Jx,<y 

Since lx+y>e < ^2x>e ou Xx>y and lx+y>e < l2i/>e ou Xx<:y we have 

F{X + Y>e}< f [ lx>e/2Pxy{d{x,y))+ f f ly>,/2Px,Y{d{x,y)) 
Jy Jx^^y Jy J Xx<:y 

<[ [ lx>e/2Px,Y{d{x,y))+ I I lj,>,/2Px,y(d(x,y)), 
and the claim of lemma follows. □ 
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Theorem 3. Suppose assumptions {G)u and (M)L™'' hold. Suppose fur- 
ther that ¥{5i = 1} < Qu for i > and < (/„ < 1. Then for any Nf, < N, 
t >0 and hf G Osci(£'j) we have 



sup 

t>o 



^ - Vt]{ht)\ >e}< (1 + 4^2^ 



JLsZ- 



g 11, 7n 



+ g„ 1 + 4\/2^ 



Proof. Using the triangle inequality in (8) we have 



j=0 



Following the methodology presented in Theorem 2 and denoting uoi 



{l-el{M)et-^\G) 



[{t-i)/m\ 



and a = ^ ^"/ff m"/i.?'* we have: 



i=0 

Using the structure of the algorithm defined in (24) and the definition of 
sampling operator introduced in (4) we obtain the following (similarly to 
Theorem 2): 



i=Q 



1=0 

i=0 
= Zi + Z2, 



where 



+ a^^,(l - 5i) [5^($.(7?f ,)) - Urili)] {hi) 



j=0 
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i=0 



Noting that 

sup P || [7?f - 7]t]iht)\ >e}< sup P {Zi + Z2>e} 
t>o t>0 

and applying Lemma 3 we have: 

sup P||[r/f - r]t]iht)\ > e] < sup F {Zi > e/2} + sup ¥ {Z2 > e/2} . 

t>0 ^ ' t>0 t>0 

Now applying Markov inequality we conclude: 

sup F{\[T]f -r]t]{ht)\ > e| < sup ple^^^i > e^^'/H +sup ple^^Za > gr2e/2| 

< sup e"^i'/2E|e^i^4 +supe"""2'/^E|e""2^4 
Next we apply the exponential series expansion 
(23) E (e^i^i ] = y ^E{Z]n 



n>0 



and use the fact that according to the following conditioning argument and 
Lemma 1 we have 



E{Zf = {E{Z^\6i = l}¥{6i = 1} + E{Zf = 0}¥{6i = 0}) 



l/n 



< 



+ 



+ 



i=0 

(1 - 6,) I [s'^iMvf-i)) - Mvti)] {K)\)y^^ 

aj^u, (P{5, = 1}E {| [5^ o S^'-miil,)) - S'^-iUr^l,))] {K 

1=0 



< a^a;i(gic(n)iV-"/2 + (1 - gi)c(n)A^-"/2) 



n/2Nl/n 



j=0 



ci/"(n) * 
1=0 
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Noting that aJ2^i^ ^n,m we have: 
i=0 



-n/2 



Substituting this into (23) and employing the same simphfications as in the 
proofs of Theorem 1 and Corollary 1 we obtain: 



^ ^ " n>0 V ^ ^' 



-£n/2 



< 



1 + 



+ E 

n>2 



Tieu,i 



_ r(n/2) 
2N J (n-1)! 



-eri/2 



1 + V2^I1^] 



Choosing ti = we have 



-eri/2 



Eje^i^i} < ( 1 + 4\/2^ 



2e^ 



Similar analysis yields 



T2eu,n 



-eT2/2 



which after choosing T2 = results in: 



g-er2/2]g|gr2Z2| < j 1 + 4 



This completes the proof. 



□ 



5. Particle Filtering with Intermittent Parametric Approxima- 
tions. In this section we analyze the error behaviour of a particle filter that 
incorporates intermittent parametric mixture estimation of the filtering den- 
sity. The probability density estimation problem consists of estimating an 
unknown probability density given the i.i.d. sample {Ci}iLi from this den- 
sity. As before, let {E, £) be a measurable space. Denote A a cr-finite measure 
on £. Throughout this section it is assumed that the underlying distribution 
has a density if its Radon-Nikodym derivative with respect to A exists. 
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We assume that with the sequence of the approximate filtering distribu- 
tions, <I>j(?7^^)(dxj), there exists an associated and well-behaved sequence of 
approximate filtering densities ^^i{rjf_i){dxi) so that the mixture density 
estimation problem is well-defined. The main result of the section, consti- 
tuted in Theorem 6, is a time-uniform, weak-sense Lp error bound character- 
izing the expected behaviour of the parametric approximation leader node 
particle filter. 

5.1. Parametric Approximation Particle Filter Algorithm. For this algo- 
rithm, the binary variable 6t now indicates whether a parametric approxi- 
mation is performed at time-step t. Again we assume that it is the outcome 
of a decision function based on the set of particles {£,t-i}k=i ^^'^ observa- 
tions Yt-i at the previous time-step. Denote by WATp : V{E) V{E^p) 
an operator that represents a parametric mixture approximation procedure 
that involves A'p mixture components (we will provide a concrete example 
below). The parametric approximation particle filter can then be expressed 
as: 

(24) r?f = 5^($,(r?ili)) if 5, = 

Example: parametric approximation leader node particle filter 
When it employs parametric approximation, the leader node particle filter 
can be represented as follows. 

^i. [riti) r/f ^ ^ — ^ ^ if S, = 1, 

Here the ^ represents the local distribution parametric approximation pro- 
cess and A'^p is the number of the components in the mixture. As before, ^ 
represents an A^-particle sampling operation and — > represents communi- 
cation between leader nodes. Thus the second particle filter we define relies 
upon a parametric approximation of the distribution ^i{ri^-^) based on the 
current particle set (a sample from this distribution). 

5.2. Parametric Approximation. Within the GML framework proposed 
by Li and Barron [19] the discrepancy between the target density / and its 
estimate is measured by the Kullback-Leibler (KL) divergence. For any two 
measures and fi on E KL-divergence can be defined as follows: 

(25) Dii^Wfi) = f log^di/ 

Je d/i 

imsart-aap ver. 2009/05/21 file: main.tex date: August 20, 2009 



28 



B. ORESHKIN ET AL. 



We will also abuse notation by writing KL-divergence for two arbitrary 
densities / and g in a similar fashion: 

(26) Dif\\g)= f logM/(^)dA(x) 

Je g{x) 

Consider the following class of bounded parametric densities: 

= \ 4>e,{^) ■ Oi e Qi,ai < inf (pe^ixi), supcpeiixi) < h \ 

where < Oj < 6j < oo and Qi C M"^' defines the parameter space, and 
inf and sup are taken over 0j and Ei. In the setting where the intermit- 
tent approximation is accomplished using parametric approximation, we are 
looking for a sequence of mixture density estimators of the filtering densities. 
We thus define the class of bounded parametric densities, t;^^. (x), indexing it 
by time-step i to emphasize that the parameterization can be time-varying. 
The approximation is restricted to a class of discrete A'^p-component convex 
combinations of the form: 

CN^,i = convNpiTCi) = Ig : g{x) = J2'^i,j(Pej{x),(j)ej G T~(-i,^o:i,j = 1, > 

[ J=i j=i 

As A^p grows without bound, C^^^i converges to the class of continuous con- 
vex combinations: 

d = conv(W,) = I (7 : g{x) = Mx)^i{dO), (/-e G 

The general framework for the greedy approximation of arbitrary 
cost functions is discussed in [29]. The particular instance of this 
more general framework is the Greedy Maximum Likelihood (GML) 
for mixture approximation (see [19]). The corresponding computa- 
tional routine, a sequential greedy maximum likelihood, associated 
with this procedure is summarized in the form of Algorithm 1. 

Algorithm 1: GML 

1 Given gi £ Ti. 

2 for k = 2 to Np do 

3 Find (j)0f, G 7i and < < 1 to maximize the function: 

N 

i^hoil) = argmax log((l - ak)gk-i{xj) + ak4>Bk{xj)) 

4 akfik j=l 

5 Let gi, = {I - al)gk-i + al<j)ei 

6 endfor 
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To link the greedy maximum likelihood maximization in Algorithm 1 to the 
minimization of KL-divergence we recall that if is a known distribution 
and /i is the KL-based fit to this distribution, the KL-divergence minimiza- 
tion problem can be written as follows (assuming that the corresponding 
densities exist): 

min DiyWp.) = min / log — dz/ 



mm 

i^eV{E) 



log — / log dz/ 



E dA(x) Je dX{x 



[ 1 d^ A 

= max / log ^ ^ . . dz/ 
ij.€V(E)Je dX{x) 

In practice z/ itself is unknown, but a sample from this distribution may 
be available. The approximation of the true expectation with respect to z/ 
above by the expectation with respect to its empirical counterpart, S^{iy), 
leads to the maximum likelihood density estimator: 



min 0(1/1111)= max Ei^log— -7— 
f^eV{E) " ' t^&V{E) d\{x 



d/i 

max ¥.oNiy\ log , , , , 
^6P(iJ) * ^dA(x) 



^ d/x 



= max log — — — (xj) 
f.ev{E) ^ ^ dA(x) ^ 



Thus the error committed by resorting to the suboptimal GML algorithm 
consists of two contributions. 

First, there is the error associated with the limitations of the approxima- 
tion class C: even the best possible ^ G C will have non-zero D{v\ if z/ ^ C. 
We will call this error the approximation error. Second, there is the com- 
bined error associated with the greedy optimization and the approximation 
of the true expectation by its empirical counterpart — we will call this the 
estimation error. In the following we analyze these errors for the one-step 
approximation and then link the results of the analysis to the overall error 
of the parametric approximation particle filter. 

5.3. Local Error Analysis. The attractive features of Algorithm 1 are 
threefold. First, the algorithm simplifies the ML density estimation proce- 
dure. Instead of facing the A'p-mixture estimation problem we only have 
to solve A'^p 2- mixture estimation problems [19]. Second, there are several 
bounds on approximation and sampling errors of Algorithm 1 in terms of 
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KL-divergence (see [19] and [24]). In this section we extend the existing re- 
sults and perform the Lp error analysis. Third, it was shown [19] that the 
performance of the greedy algorithm converges to the performance of the 
optimal mixture estimation algorithm as and Np become large. Thus if 
these conditions hold, the results obtained for the parametric approximation 
particle filter that uses GML are also applicable if other density estimators 
are employed. 

Here we state the relevant results from [19] that will be of use in fur- 
ther analysis. The following notation is introduced to facilitate presenta- 
tion. Assuming that / is a target density and g £ C we denote D{f\\C) = 
infggc D{f\\g), the least possible divergence between a target density, /, and 
a member g from the class of continuous convex combinations C. Further- 
more, assuming that the target density / is known, the analytical estimator 
gNp g (J^^ can be obtained by solving the following greedy recursion for 
i = 2 . . . Np (see Algorithm 1): 



i^toi*k) = argmax / log((l - ak)gk~iix) -h Ofe^g^ (x))/(x)dx. 

Alternatively, g^p G C^^ is an empirical A^p-mixture estimator constructed 
using Algorithm 1 based on a sample from the target density, /. The fol- 
lowing theorem (see [19]) reveals an important general property of the GML 
algorithm. 

Theorem 4 (Li and Barron [19], Theorem 2). For every gc{x) G C 
Dif\\g''^)<D{f\\gc) + ^ 



Here, 



, _ r /e0i(x)P(dg) 
dj = 4[log(3Ve) + supe^^02G0,xGi? ^og{4>e, (x)/^^^ (x))] 



an,' 

One of the consequences [19] of Theorem 4 is the following relationship 
between an arbitrary gc{x) G C and the empirical GML algorithm output 

-I AT ^ N 2 

(27) _^log5^p(x,) > -^Iog5c(x0 - 

1=1 i=l P 
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Clearly, it also follows directly from Theorem 4 that D{f\\g'^p) < D{f\\C) + 

jy' . Thus Theorem 4 establishes a strong formal argument that shows that 
the greedy density estimate converges to the best possible estimate as A'^p 
grows without bound. Similar results for the empirical estimator appear 
in [24] and [19]. 

Our next goal is to connect the existing results on the performance of the 
GML in terms of the KL-divergence to its performance in terms of Lp error 
metric. Our next result reveals the error bound characterizing the average 
performance of the GML algorithm. One of the components of the bound 
is the packing number 'D(e,7i,dj\[), which is the the maximum number of 
e-separated points in 7i (the class of parametric density functions) under the 
empirical semimetric di^f. The empirical semimetric is defined for /ii, /12 G W 
as 

1 ^ 

d%{hi,h2) = — ^(/ii(xfc) - h2{xk)f. 



k=l 



Theorem 5. Suppose g^p G C^p is constructed using Algorithm 1 and 
g^v g V{E) is the distribution associated with g^f . Suppose further that 
there exists density f associated with the target distribution F G V{E). 
Then for any h G Bh{E) with ||/i||osc < 1, P > 1, and N,Np £ N we have: 

i/p 



,2 



+{p/A)[CeJ^ ^log{l + V{e,n,dN))de \ + ^ + D{f\\C) 



1/2 



where C is a universal constant^ 



Proof. Using Pinsker's inequality, J\f - g\ < ^/2D(J\\g), [7] we have 



E 



EU I [g^H^) - f{x)]h{x)dx. . 

< WhWEi i jjg^p{x)- f{x)\dx''^^''' 
p^ i/p 



V2\e[du\\9''-Y'^} 



1/2 



^See [26] for details. 
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Now, suppose p > 2. The following decomposition can be used to analyze 
the previous expression: 

= Dm""^) - m\c) + D{f\\c) 

Denoting g* = argmin^gc D{f\\g) we have the following modification of the 
decomposition proposed by Rakhlin et al. in [24]: 



Dif\\g'^^)-Dif\\C) 



logg^p{x)F{dx) + J logg*{x)F{dx) 
1 ^ 

J log ff^p ix)F{dx) + ^ E l°g ^"^^ (^*) 



+ J^^2^oS9*{xi)-^Y.^ogg^^{x,) 

i=l i=l 
1 ^ 

+ / logg*{x)F{dx) - —Y^logg*{xi) 

i=l 

Applying (27) to the middle term we see: 

2 

Dim''-) - Dim) <\[F- S^(F)](logg^p)| + |[F - S^(F)](logg*)| + ^ 

By the definition of D{f\\C) it follows that L>(/||^^p) - D{f\\C) > and 
thus we conclude: 



Dif\\g''^)-Dif\\C) <2sup [F - 5^^ (F)] (log 5) 

gee 



+ 



24c 

iVn 



This allows splitting the effect of approximation and sampling errors by 
applying Minkowski's inequality (since p > 2): 



E 



{Dim''- 



2/p 



E 



Dif\\g''-)-Dif\\C)+Dif\\C) 



< 2E 



sup 

gee 



[F-S^iF)] (log5) 



p/2|2/P 
1 p/2 ^ 2/p ^^^2 



+ 2^ + i?(/||c). 



The next step of the proof makes use of a symmetrization argument. 
We recall the definition of the Rademacher sequence (e^) as a sequence of 
independent random variables taking values in { — 1,+!} with P{efc = 1} = 
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P{efc = —1} = 1/2. Denote by the generator of the signed Rademacher 
measure (with being the samples from fi): 



1 



N 



k=l 



Using the symmetrization lemma (see e.g. Lemma 2.3.1 in [26] or Lemma 
6.3 in [18]) we deduce: 



sup 

gee 



[F -S^{F)] (log g) 





C" ) 


• 




< 2E< 











sup 



S^{F) (log 5) 



Denoting k{x) = g{x) — l and using the fact [23] that ip{K{x)) = alog(K(x) + 
1) is a contraction^, we apply the comparison inequality (Theorem 4.12 
in [18]), observing that [•]^^^ is convex and increasing for p > 2 and k is a 
bounded function: 



sup 

gee 



5f(F) (log5) 









r=-i 



1 2 

--sup 
^ a gee 



S^{F) (alog(K + l)) 



p/2 ^ 2/p 



< -E ■ 
a 



< -E - 
a 



sup 

gee 



S^{F) (g-l) 



p/2 ^ 2/p 



-|p/2^ 2/p 



sup|5f(F)(g)| 
gee 



+ 



-E{|5f(F)(] 



Using Lemma 1 we have: 
E{|5f(F)(l)|^ 



2/p 



E 



TV 



N ^ 



1=1 



p/2\ 2/p 



< 



2c'^/P{p/2) 



^The function tp 



is a contraction if we have \ip{x) — ^p{y)\ < \x — Vs, y £ E. 
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On the other hand, we have for any g G C a corresponding £ 7i: 

N 
1 



1=1 

N 



1 



N 



< 



N 



< sup 



-t 



N 



¥{d9) 



1=1 



Thus we have 















sup|5f(F)(5)| 


) 


sup|5f(F)(5)| 


P/2| 




gee 









2/p 



The Orhcz norm [4,26] Tr^^{Y) of a random variable Y is defined, for a 
nondecreasing convex function 'il^p{x) = — 1, as 

7r^^(y) = inf{C > : W.{^p{\Y\/C)} < 1}. 

By Hoeffding's inequahty the Rademacher process S^{F){g) is sub- 
Gaussian for the semimetric djsi [26]. Using the fact that 'E{XP}^^p < 
(p/2)!7r^2 (X) (see e.g. Lemma 7.3.5 in [4] or [26], p. 105, Problem 4) we 
deduce: 



EE, 



sup 1 5f (F)(5) I 
sew 



<(p/4)!Evr^,(sup|S'f(F)(5)|) 
gen 



In addition, since {F){g) is sub-Gaussian, we have for some universal 
constant C (see Proof of Corollary 2.2.8 in [26]): 

E7r^,(sup|Sf (F)(5)|) < -^E Tb^^+^M^^de 

g£H VN Jo ^ 
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Combining the above we have: 
E - F]{h)\^f'' < V2 [-^ (2c2/^(p/2) 

b \ 2 "I 1/2 

+{p/Ay.CKj^ ^log (1 + V{e, n, d^))de j + ^ + D{f\\C) 
Finally, suppose 1 < p < 2. In this case using Jensen's inequality we have: 

2/p 



E 



{Dif\\g^^r/Y''<^{Dif\\9''^)} 



Thus the above analysis applies if we choose p = 2 and the proof is now 
complete. □ 

Corollary 4. Suppose that the assumptions of Theorem 5 hold. Sup- 
pose in addition that f £ C then we have for any p > 1: 

i/p 



E{|[g^^-F](/.)r} ' < ^ (2c2/P(p/2) 



+{p/A)\C¥.j^ 0og(l + P(e,H,d^))ffej +41og(3^(6/a))^ 

Proof. The proof follows from the fact that under the additional as- 
sumption we have D{f\\C) = 0. Furthermore, we note that under this as- 
sumption < {b/aY and 7 = 4:\og{?,^/e{b / a)) □ 

5.4. Time-uniform error bounds. In this section we present a result spec- 
ifying time-uniform error bounds for particle filters performing intermit- 
tent parametric approximation. The result links the properties of Markov 
transitions Mj(xi_i,dxj) and error bounds for parametric GML approxi- 
mation (Theorem 5) with the propagation of approximation errors through 
Feynman-Kac operators. It is based on the following observations. For an ab- 
solutely continuous Markov kernel with density pi{xi\xi-i)^ we can write [4]: 

Mi{xi-i,dxi) = Pr{Xi G dxj|Xi„i = = pi{xi\xi_i)dxi = pi}.{xi)dxi, 

where we explicitly assume that the structure of the kernel Mi can be cap- 
tured by a set of parameters i9i S C ffi"^' (these parameters may include 
the state- value Xi-i). We can further define a class M.i of such densities: 

M^ = {p^Sx,)■.^^Ge^CR''^}. 
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Thus if Mi is such that p^-{xi) G Mi and Mi C TCi then the assump- 
tion (M)!"*^ is satisfied with m = 1 and eu(M) = a/b, yielding for any 
Xi-i,yi-i G Ei_i: 



Furthermore, using the definitions of the one-step Boltzman-Gibbs transfor- 
mation and the associated Feynman-Kac operator we see that the distribu- 
tion at time i is related to the distribution at time i — 1 as follows: 



Thus for an absolutely continuous Markov kernel with p^.[xi) G Mi we can 
rewrite the previous equation with a suitable change of measure: 



This implies that for an A^-particle approximation 77^^ we have that G 

convjv(A^j) and, as grows without bound, we have '''^^^'•^ G conv(A^j). 
Thus the performance of GML approximation algorithm is determined by 
the properties of Markov transition kernel Mj(xj_i, dxj) and the class of ap- 
proximating densities TLi. In particular, for an absolutely continuous Markov 
kernel with p^. (xj) ^ Mi and a sufficiently rich class TLi, such that Mi TLi 
we have asymptotically unbiased approximation: 



The preceding discussion can be summarized in the form of a concise as- 
sumption: 

• {TL)u- The Markov kernels associated with the target dynamics are ab- 
solutely continuous and can be expressed in the form Mj(xj_i, dxi) = 
p^^{xi)dxi. The class of densities associated with Mj is defined as 
■M-i = : ^i e &i C M*^^! . For each Mi there exists an ap- 

proximation class TLi and strictly positive numbers Ou = infj>oai, 



Mi(x,_i,-) > -Mi(2/i_i,-) 



r]i = $i(r/i_.i) = ^'i_i(r?i„i)Mi 
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b^x = supj>o bi satisfying < flu < < oo such that for any i > we 
have 

Mi C Hi and hence Mi{xi^i, •) > ^Mj(?/i„i, •) 

The following result describes the analog of Theorem 2 for the case of a 
parametric approximation particle filter using the GML algorithm. 

Theorem 6. Suppose assumptions and {7i)u hold. Suppose further 
that P{(5j = 1} < (?u for any i > and < < 1. Then for any Np, N > 1, 
t > 0, p > 1 and ht G Osci{Et) we have the time uniform hound 



supE{|[r?f -r?,](/ii)r}'^'<6„ 
t>o 



N 



16 



2c^/P{p/2) 



+C(p/4)!supE / J\og{l + V{e,rLi,dN))d£ + 81og(3yi(6/a)) 

where the constant e„ is given by: 

^ 2 - (a„/fe„)g„(G) 
(a„/6„)3e„(G) ' 

Proof. Using the same argument as in Theorem 2 we have 

i/p 



{b/af 

Nr, 



^ 2-6„(M)6„(G) 
- eu(M)eJG) 



p-i i/p 



i=0 



Based on the Minkowski inequality we have the decomposition for each in- 
dividual expectation under the sum above: 



EjKof --fita-i)] (hi}\'Y 



< E 

+ e{ 



Using the same conditioning argument as in Theorem 2 and applying Corol- 
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lary 4 based on the assumption {TC)u to the second term we have: 

p~i i/p 



E 



_aiVN 
+ 41og(3V^(6i/ai)) 



.1/2 



Applying conditioning and Lemma 1 and the same conditioning argument 
as in Theorem 3 to the remaining term we have: 



l/p ^ c^/P(p) 



N 



+ q]'" ( V2 



( 202/^^(^/2) + (p/4)!CE J\og{l+V{e,m,dN))de 
QiW N \ Jo ^ 



+ 41og(3V^(6,/oO) 



1/2N 



We conclude that since qi < q^ then for any i > we have the time-uniform 
estimate: 



E 



+9y^v2 



auVN 



( 2c2/f(p/2) + (p/4)!C7supE Jlog{l + V{e,ni,dN))de 

\ i>0 JO ^ 



+ 41og(3V^(6u/au)) 



1/2 



This along with a variation of (19) for m = 1 and the fact that according to 
assumption ('H)u, eu(-^) > o-u/bu, completes the proof of theorem. □ 

The above theorem provides an error bound for the parametric approxi- 
mation particle filter (using the GML algorithm to perform approximation) 
that is similar in structure to that specified for the subsampling approxima- 
tion particle filter. The error bound consists of two distinct contributions, 
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one corresponding to the normal operation of the filter and the other captur- 
ing the impact of the parametric approximation operation. The theorem es- 
tablishes a sufficiency requirement on the sequence of approximating classes 
TCi leading to unbiased approximation of distribution flows. The requirement 
is that the Markov transition kernel must have an associated bounded den- 
sity and this density must be a member of the class TCi. This condition is 
reminiscent of the modeling assumptions that underpin Gaussian sum par- 
ticle filtering (see e.g. [12]), where the premise is that the filtering density 
can asymptotically be represented as an infinite sum of Gaussians. 

6. Numerical Experiments. In this section we present the results of 
numerical experiments exploring the performance of the leader node parti- 
cle filter. The experiments provide an example of how the subsampling and 
parametric approximation particle filters can be applied in a practical track- 
ing problem. They provide an opportunity to compare the performance of 
the two algorithms and to examine whether practical behaviour is similar 
to that predicted by the theoretical analysis. 

We adopt the following information acquisition and target movement 
models. The state of the target is two-dimensional with dynamics [11] 

Xt = Xt-i + ro{[cosipt;smipt]) + ut. 

Here tq is a constant (we set tq = 0.02) and (pt,ut are independent and 
uniformly distributed ut ~ J7[0, 1], ipt ~ J7[— 7r,7r]. Ki = 20 leader nodes 
and Ks = 200 satellite nodes are distributed uniformly in the unit square. 
A satellite sensor node j with coordinates sj = [sij,S2j] can transmit its 
measurement to any active leader node within the connectivity radius Vc- 
The connectivity radius is set to Vc = \/2log{Ks)/ Ks (note that if every node 
can be a leader node, Ki = Kg, the resultant network topology is a random 
geometric graph). We assume that any active leader node can transmit an 
approximation of its posterior representation to any other potential leader 
node. 

The measurement equation of every satellite sensor is the binary detec- 
tor [2] capable of detecting a target within radius with probability pd and 
false alarm rate pf. 



if Xte^l 
if Xt^A-j 



where the detection region of satellite sensor j is defined as = {x : 
\\x — Sj\\2 < Td}. To perform sensor selection step we use the mutual infor- 
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mation (MI) criterion [20]: 

(28) £t+i = arg max /(Xi+i,y,5+^|yf^^,-) 

Here yi-/'* denotes the entire history of measurements, and the random vari- 

able Y^_^l'^^ denotes the (potential) set of measurements at time i + 1 by the 
set of satellite sensor nodes (5f(_,_i) of a candidate leader node £t+i- The 
calculation of the mutual information in the multiple sensor framework is 
generally a computationally demanding exercise. In the binary sensor frame- 
work, the calculations can be simplified using an efficient approximation 
(see [22] for details). 

Williams et al. pointed out in [28] that the application of the one-step mu- 
tual information criterion for sensor selection can result in undesirable leader 
node bouncing (frequent, unnecessary hand-off). To prevent this, Williams 
et al. proposed a computationally demanding finite-time horizon dynamic 
program [28]. In our simulations we use a simpler randomized algorithm to 
control the leader node exchange rate. In this algorithm the current leader 
node flips a biased coin with the probability of the flip outcome being 1 
equal to A. If the outcome is 1 then the current leader node calculates the 
mutual information criterion. It then determines if the current particle rep- 
resentation should be transferred to a new leader node that is more likely 
to make informative measurements. If the outcome is 0, then no calcula- 
tions are performed. With this approach, the computational load for each 
leader node is significantly reduced and the communication overhead can 
be regulated by the choice of A. However, the value of A should be tailored 
depending on the application (as the mobility of the target increases, leader 
node hand-offs must be considered more frequently). In our experiments we 
fix A = 1/5. 

We consider two leader node particle filtering algorithms, with one em- 
ploying non-parametric approximation (subsampling) and the other using 
parametric approximation. To create a subsample for transmission in the 
non-parametric framework we use the general residual resampling scheme [8] . 
The parametric leader node particle filter is implemented using the GML al- 
gorithm with A'p components. Each component consists of a two-dimensional 
Gaussian density with diagonal covariance matrix. The mean vector and co- 
variance matrix are estimated using the particle representation available at 
the current leader node. To implement the GML algorithm we used the stan- 
dard MATLAB nonlinear optimization routine fmincon (see [22] for details 
of the implementation). 

In the following we report the simulation results obtained using the set-up 
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Fig 2. Performance (RMSE) of different fusion schemes versus time: v denotes the 
scheme with fixed leader node selected at initialization; + denotes the scheme with leader 
node selected using approximate Mutual Information (MI) criterion and non-parametric 
(suhsampling) approximation with Nb = 10; □ denotes the scheme with leader node se- 
lected using approximate MI criterion hut no suhsampling approximation (Ni, = 300); and 
o denotes the centralized scheme using the entire set of measurements from all sensors at 
every step. 
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discussed above. All results are achieved using 5000 Monte Carlo trials, and 
in each trial a new trajectory of the target is generated. 

We first demonstrate that the discussed sensor selection procedure (leader 
node exchange rule) has good information fusion properties. Fig. 2 depicts 
the performance in terms of Root Mean Squared Error (RMSE) between 
the true position of the target and its estimate using different information 
diffusion schemes. The first scheme denoted by V corresponds to the sit- 
uation when the leader node is selected at the initialization and is fixed 
throughout the tracking exercise. The second and third schemes denoted by 
+ and □ respectively correspond to non-parametric leader node algorithms 
using A^b = 10 and A^b = 300 particles for communications respectively. The 
fourth scheme denoted by o corresponds to the centralized scenario when all 
the measurements available from every sensor at every time step t are used 
to track the target. Note that the baseline particle filter uses A^ = 300 parti- 
cles^ in all scenarios (so the Ab = 300 case corresponds to no subsampling) . 
We can see from Fig. 2 that the centralized scheme has the best perfor- 
mance in terms of RMSE. However, it is only marginally better than the 
leader node scenario without compression (A^ = A^b = 300). This highlights 
the effectiveness of the leader node particle filtering method and confirms 
that leader node selection based on the approximate mutual information is 
a valid approach. Compared to the centralized scheme, the communication 
and power consumption costs are significantly decreased since only a small 
subset of nodes is activated at any particular time step. 

The leader node particle filter that uses a very small number of transmit- 
ted particles (A^b = 10) performs comparably well. This suggests that there 
are practical scenarios where a particle filter can incorporate aggressive ap- 
proximation to reduce communication overhead without incurring a signifi- 
cant penalty in tracking accuracy. The fixed leader node approach performs 
poorly, because the activated sensors only provide useful information when 
the target is nearby. As the target moves further away, the particle cloud 
approximating the filtering distribution becomes very diffuse, and tracking 
accuracy is 4 times worse than that of any of the other schemes. 

In the next set of results, we explore the approximation error, i.e. the 
error induced by both sampling and the additional parametric/subsampling 
approximations. The RMSE combines both approximation error and estima- 
tion error resulting from the inaccuracy and/or ambiguity of the measure- 
ment information. We can estimate a Root Mean Squared Approximation 

*This value was selected after experimentation with multiple values of A'' because it 
provides sufBcient accuracy without inducing unnecessary computational overhead. The 
primary purpose of the simulations is to examine the impact of the approximation steps. 
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Error (RMSAE) by calculating the error between a candidate particle filter 
and an "ideal" reference particle filter. As our reference filter, we employ a 
particle filter that uses N = 3000 particles, with no approximation during 
hand-off. For each of the 5000 Monte Carlo trials, we apply this reference 
filter to generation location estimates. The approximation error for our test 
filters is measured relative to these estimates rather than the true locations. 

Figure 3(a) depicts how the tracking performance is affected as the num- 
ber of particles in the subsampling step (A'b) changes; Figure 3(b) provides 
similar results for the parametric approximation method as the number of 
components in the mixture model (Np) is varied. The performance is mea- 
sured in terms of the RMSAE increase relative to a leader node particle filter 
that performs no approximation, i.e. uses A'^b = N = 300 particles during 
hand-off. 

Fig. 3 indicates that the performance of the leader node particle filter has 
interesting dynamic structure. In particular, in the time period t G [1, 50] we 
can see an articulated transient behaviour (see Fig. 3(a), = 10 in particu- 
lar). The transient in these curves arises because the particle representation 
of the target location density is initially highly dispersed and multi-modal, 
making it relatively difficult to approximate using either a subsampling or 
parametric method with a small number of particles/mixture components. 
However, as time progresses (t G [51, 100]) the particle representation of the 
target becomes more localized and closer to uni-modal, so approximation 
performance improves significantly. Qualitatively, the performance deterio- 
rates gracefully with respect to the extent of the compression during hand-off 
(reduction in number of particles or components), as theoretically predicted 
in the previous sections. 

For the final performance analysis, we define a compression factor as the 
ratio of the number of particles used during regular particle filter compu- 
tations to the number of values transmitted during the hand-off. For the 
subsample approximation case, this is simply N/Ny^. In our case of a Gaus- 
sian mixture, variance information is transmitted in addition to the locations 
of the Gaussians and the mixture weights, so the factor is 2N/5Np. Figure 4 
presents a box-plot depicting performance deterioration (ratio of approxima- 
tion error of the leader node with Ni, < N and the leader node with Nh = N) 
versus the compression factor. Both the median and the maximal deviations 
of the performance deterioration scale smoothly with changing compression 
factor. Parametric approximation clearly outperforms subsampling. 

For the subsampling case. Theorem 1 provides an analytical bound on the 
expected approximation error. The curve based on this result is depicted in 
Figure 4(a) and provides a meaningful characterization of the expected per- 

imsart-aap ver. 2009/05/21 file: main.tex date: August 20, 2009 



44 



B. ORESHKIN ET AL. 



formance deterioration. Indeed, the theoretical prediction closely coincides 
with the maximal performance deterioration observed for each compression 
factor. For comparison purposes, we include a bound derived based on a sim- 
ple worst-case assumption that the subsample approximation particle filter 
performs only as well as a particle filter that uses A^b particles at all times. 
The bound developed in this paper clearly provides a better indication of 
the performance deterioration. 

7. Related Work. The analysis of approximation error propagation 
and stability of non-linear Markov filters has been an active research area for 
several decades. In [13] Kunita studied the asymptotic behaviour of the er- 
ror and stability of the filter that has an ergodic signal transition semigroup 
with respect to the initial distribution. Ocone and Pardoux [21] addressed 
the stability of linear filters with respect to a non-Gaussian initial condition 
and examined the stability of non-linear filters in the case where the signal 
diffusion is convergent. The important conclusion drawn by Ocone and Par- 
doux based on results in [13, 21] is that if the signal diffusion is stable with 
respect to its initial condition then the optimal filter inherits this property 
and it is also stable with respect to the initial condition. 

Although interesting, the results in [13, 21] address the optimal filtering 
scenario, and more relevant to our study is the analysis of approximately 
optimal filters (especially particle filters). Important results concerning the 
stability of particle filters have been developed over the past decade [3-6,8, 
15-17,27]. 

The Feynman-Kac semigroup approach to the stability analysis of particle 
filters has been described and developed by Del Moral, Miclo and Guion- 
net in [4-6]. The authors study the stability properties of general non-linear 
Feynman-Kac semigroups under a variety of assumptions. The Dobrushin 
contraction coefficient of the underlying Markov chain plays a central role 
in the analysis. In [6], Del Moral and Miclo formulate the conditions for the 
exponential asymptotic stability of the Feynman-Kac semigroup and bound 
the Lyapunov constant and Dobrushin coefficient. One of the applications of 
these results is a time-uniform upper bound on the error of interacting parti- 
cle systems. In [4], Del Moral provides an extensive analysis of the properties 
of Feynman-Kac semigroups. His analysis forms the basis for our study in 
this paper, particularly in the case of the subsampling approximation par- 
ticle filter. 

Stability analysis for particle filters is frequently built on relatively strong 
assumptions about the mixing and ergodicity properties of the underlying 
Markov transitions of the signal (target state). Our analysis in this paper is 
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no exception, with the regularity conditions (M)u and {Tt)n underpinning 
our results. There have been some efforts to relax these types of assumptions. 
In [16,17], Le Gland and Oudjane study the stability and convergence rates 
for particle filters using the Hilbert projective metric. In [16], they relax 
the signal mixing assumptions by employing a specific, "robust" particle 
filter architecture with truncated likelihood functions. In [17], the mixing 
assumption is applied not to the Markov kernel governing signal diffusion, 
but to the non-negative kernel that governs the evolution of the particle 
filter. This kernel combines the effects of the Markov transitions and the 
likelihood potentials, so mixing behaviour can arise from either of these two 
components. 

The papers cited thus far addressed the analysis of particle filters with 
fixed population size (number of particles). In the subsampling approxima- 
tion particle filter analyzed in this paper, the number of particles varies over 
time. Crisan et al. examine the stability of branching and interacting parti- 
cle systems in [3]; in these systems the population size also varies, because 
at each time step a particle generates a random number of offspring. The 
population size forms a positive integer-valued martngale with respect to 
the filtration and the properties of the resulting particle filter depend on 
the initial number of particles. The variation in the number of particles is 
clearly very different from that of the subsampling approximation particle 
filter, so the results are not directly applicable. 

Thus far we have discussed previous work that has addressed particle 
filter stability when the error arises due to the sampling approximation. 
The sampling error is dependent on the resampling schemes, and Douc et 
al. have provided theoretical results that allow various resampling schemes 
to be compared [8]. Other work has considered additional sources of error. 
Vaswani et al. analyzed the performance of a particle filter in the case of 
signal model mismatch (when the true underlying Markov transitions differ 
from the model used to update the filter) [27]. They showed, using the 
same assumptions as in [17], that the particle filter is stable if the mismatch 
persists for only a finite interval of time. 

Le Gland et al. propose and analyze the kernel-based regularized particle 
filter in [15, 17], and this work is the most closely related to our study of the 
parametric approximation particle filter. From an algorithmic standpoint, 
there are also similarities with the Gaussian sum particle filter [12], but the 
theoretical analysis of this filter is less developed. The regularized particle fil- 
ters described in [15,17] incorporate a step in which the A^-sample pointwise 
density approximation is replaced by a continuous density approximation, 
using a kernel-based density estimation approach. During resampling, A'^ 
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particles are generated by sampling from this continuous density. The prac- 
tical benefit of this approach is the increase in the diversity of the particle 
system, eliminating the potential for degeneracy and improving the stabil- 
ity of the algorithm. Le Gland et al. provide uniform convergence results 
for the regularized particle filters. Although there is some similarity to the 
parametric approximation particle filter we analyze, the purpose of the ap- 
proximation is very different. It is not performed intermittently to reduce 
computation or communication cost, but rather is performed every time step 
with a complex model {N components). 

There has been some work addressing the analysis of the leader node 
particle filter [11]. Although simulation (and to some extent, experimen- 
tal) results indicate that instability effects are rarely observed in the leader 
node particle filtering, prior to our work, the theoretical bounds on esti- 
mation error for leader node particle filtering using intermittent parametric 
approximation grow exponentially over time [11]. The analysis in [11] used 
maximum log-error to model the approximation error propagation. Because 
the analysis does not take into account the structure of the dynamic system 
(and indeed imposes no assumptions on its mixing properties and regular- 
ity), the resulting bounds diverge. 

8. Concluding Remarks. We have presented the analysis of particle 
filters that perform intermittent approximation. Our main results have the 
form of upper bounds on the expected Lp error of particle filters that occa- 
sionally employ either subsampling or parametric approximations of the fil- 
tering distribution. Such approximation steps become necessary when parti- 
cle filters are deployed on resource-constrained platforms, where the resource 
can be energy, memory or computational power. The important conclusion 
of our analysis is that these approximation steps do not induce instability, 
and moreover, the frequency of the approximation steps significantly affects 
the extent of performance degradation. If the approximation steps are rare, 
then the compression can be very high (very few subsamples or very few 
mixture components) and the error remains reasonable. Numerical exper- 
iments indicate that the bound for the subsample approximation particle 
filter provides a meaningful characterization of practical performance. 
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Table 2 
Notation 

Symbol Definition 

Bb(E) Banach space of bounded functions 

c{p) Constants (c(p) = 1 if p = 1; and c(p) = 2~''/^pr[p/2] if p > 1) 

CjVp class of discrete A^'p-component convex 

combinations of densities from Ti. 
D{f\\g) Kulback-Leibler divergence for densities / and g 

D{f\\C) inf.ecDifWg) 
©(e, H, djv) packing number - maximum number of e-separated 

points in Ti. using the empirical semimetric djv 
d%{hi, /12) empirical semi-metric for /ii, /12 G Ti 

{Et, £t), t £N sequence of measurable spaces through which the target transitions 

tj^p Np component distribution associated with 

g p Np component mixture density 

empirical TVp-mixture density estimate constructed from 
A'^ samples from a target density 

Gt Et ^ [0, 00) bounded, non-negative potential functions characterizing Yt 

{G)vL time-uniform regularity condition for likelihood potentials 

H. class of bounded parametric densities 

Mt+i Markov chain transitions of Xt from Et into Et+i 

Mi^t = Mi+i . . . Mt composite integral operator from [Ei, £i) to {Et, St) 

Af(-, ■) Markov transition kernel 

(M)u™'' time-uniform regularity condition on the Markov functions 

m{X) = X/fc'-i ^x'' the A-empirical measure 

m{X)(h) Application of empirical measure to a sequence of functions hk'. 

N 

m{X){h) ^±J2hk{X,) 

fe=i 

N number of particles in standard particle filter 

Nt number of particles in subsampled particle filter 

Np number of components in mixture approximation 

osc(/i) sup(\h{x) — h(y)\; x,y £ E) 

Osci(i5) convex set of £^-measurable test functions with finite oscillations 

{{h : osc(/i) < 1}) 

AT 

Sampling operator S^{ri){h) ^ J2 H^'^) 

fe=i 

(fj.) generator of the signed Rademacher measure for distribution fj, 

Xt dx X 1 target state vector at time t (Markov chain) 

X^ — A[o:t] the historical path process associated with Xt 

Yt dy X 1 observation vector at time t 

Yt — ^[1:4] the history of observations from time 1 to t 
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Fig 3. Deterioration of performance as a function of (a) varying number of transmitted 
particles for the subsampling approximation leader node particle filter; and (h) varying 
number of transmitted mixture components for the parametric approximation leader node 
particle filter. The performance deterioration is measured as the ratio of the Root Mean 
Squared Approximation Error (RMSAE) of the candidate particle filtering algorithm with 
intermittent approximation (subsampling or parametric) to that of a leader node particle 
filter that performs no approximation (N-^, = 300J. See Section 6 for a definition of the 
RMSAE. 
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(a) Non-parametric leader node particle filter. □ de- 
notes the naive performance deterioration diaracteriza- 
tion, yWV/iVb. o denotes the proposed characterization 
captured by Theorem 2. 
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(b) Parametric leader node particle filter 

Fig 4. Box-plots showing the relationship between deterioration of approximation perfor- 
mance and compression factor. The performance deterioration is measured as the ratio of 
the Root Mean Squared Approximation Error (RMSAE) of the candidate particle filtering 
algorithm with intermittent approximation (suhsampling or parametric) to that of a leader 
node particle filter that performs no approximation (N\, — 300j. The compression factor, 
defined in Section 6, is the ratio of N to the number of values transmitted during leader 
node exchange (Nb or 2.5Np). The boxes show lower quartile, median and upper quartile 
of the 5000 Monte Carlo trials. Whiskers depict 1.5 times the interquartile range and cap- 
ture most of the extreme values, and the -f values denote outliers extending beyond the 
whiskers. 
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